Group Project: Kickstarter Campaign

Kickstarter is a company that provides a space for independent artists, creatives, innovators, and entrepreneurs to bring their unique projects to life. Kickstarter allows anyone to financially support a project through an online pledging system: anyone can pledge a specific amount of money toward the project's funding goal. Kickstarter projects (also called campaigns) are all-or-nothing, meaning that if the funding goal isn't met 100% (or exceeded), the campaign fails and no funding is provided. The stakes are high, but what is it about certain campaigns that makes them succeed while others fail?

You have joined the marketing team for Kickstarter and you are tasked with exploring the features of several campaigns over the past few years. You are responsible for looking at this dataset and pulling out key insights about the characteristics of Kickstarter campaigns that make them more likely to succeed or fail. The marketing team at Kickstarter has a limited amount of funds to devote to highlighting specific projects, and they want to highlight projects that have the best chance of succeeding (i.e., meeting the funding goal). Can you help them determine which projects those might be?

Complete the steps below to take a dataset from inception to insights, for the purpose of answering the following two questions:

1. What kinds of projects should the Kickstarter Marketing Team focus their attention on?
2. Why should the Marketing Team focus their attention in that direction?

After you have combed through the data, your group will put together a presentation (including visualizations) that clearly answers the two questions above. The Kickstarter dataset is posted to Canvas. Complete the code in the notebook below to complete the project. You are NOT restricted to the steps laid out in this notebook: you can conduct additional analyses or create additional visualizations. This notebook is everything you need to cover, but feel free to expand on these steps! This includes completing regression analyses with the dataset.

Part 1: Domain Knowledge

Before you begin looking at the data, you need to expand your knowledge of the subject matter. Start by visiting www.kickstarter.com and reading all you can about the company. What is the objective of the company? How does it work? What does the project timeline look like? You should research the company until you feel comfortable speaking about the basics of Kickstarter projects. Use the space below (double-click the cell to activate) to write a brief paragraph about what you learned about Kickstarter.

Questions to Answer

  1. What is Kickstarter?
  2. What is the purpose of Kickstarter?
  3. Who are the "backers" of a Kickstarter campaign?
  4. How is success determined for a project campaign?

KICKSTARTER INFORMATION

In a nutshell:

Kickstarter is an online funding platform designed as a connection between creativity and capital. Its purpose is to break free from a central focus on profit at all costs. The "backers" are individuals or groups that pledge money toward a project; because campaigns are all-or-nothing, the project receives that money only if it gets fully funded. Success is determined by whether a project meets its funding goal within its timeline.

Part 2: Data Import and Cleaning

Now that you are familiar with where the data is coming from, you are ready to start examining the data. The Kickstarter Dataset is a collection of project campaign information from 2016. The data includes information such as the project name, length, country of origin, goal, and amount of money raised. If you open the Kickstarter Dataset in Excel, the second tab provides descriptions of each of the columns/variables. Import the dataset into this notebook and follow the steps below to gather information about the data and to clean up the dataset. Use the space below (double-click the cell to activate) to write a brief paragraph describing the dataset and the steps you took while cleaning the data.

Questions to Answer

  1. How many columns are in the dataset? How many rows?
  2. What type of variables (continuous, categorical) make up this dataset?
  3. Which variables have missing values?
  4. How did you handle the missing data in the dataset?
  5. How many rows are in your dataset after handling missing data?
  6. Are there any odd or inappropriate values within a column that don't make sense? If so, what are they?
  7. What percentage of each project was funded? In other words, how close were they to reaching the goal? You will need to create a new variable to answer this question.
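
The inspection steps in the questions above could be sketched roughly as follows. This is a minimal illustration in pandas, not the required solution: the tiny DataFrame is made-up stand-in data, and the column names (`name`, `main_category`, `goal`, `usd pledged`, `backers`, `state`) and the new `pct_funded` variable are assumptions that should be checked against the column descriptions in the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real Kickstarter dataset.
# Column names are assumptions; adjust them to match the actual file.
df = pd.DataFrame({
    "name": ["Board Game", None, "Indie Film", "Art Book"],
    "main_category": ["Games", "Music", "Film & Video", "Art"],
    "goal": [1000.0, 5000.0, 20000.0, 500.0],
    "usd pledged": [1500.0, 250.0, np.nan, 500.0],
    "backers": [40, 5, 0, 12],
    "state": ["successful", "failed", "failed", "successful"],
})

# Questions 1 and 2: size of the dataset and the type of each column.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
print(df.dtypes)

# Question 3: count missing values per column, showing only affected columns.
missing_per_column = df.isna().sum()
print(missing_per_column[missing_per_column > 0])

# Question 7: new variable for the percentage of the goal that was funded.
# This assumes goal and pledged amounts are expressed in the same currency.
df["pct_funded"] = df["usd pledged"] / df["goal"] * 100
print(df[["goal", "usd pledged", "pct_funded"]])
```

A `pct_funded` value of 100 or more corresponds to a campaign that met or exceeded its goal; rows with a missing pledged amount come out as missing here, which is one reason to handle missing data before relying on this variable.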

DATASET INFORMATION AND CLEANING

Answers below:

  1. 281,856 rows and 14 columns
  2. All types of variables - integer, decimal, date/time, and categorical
  3. Missing values in the following columns: name, usd pledged
  4. A project without a name is still a project, so missing names were changed to "unknown"; a project without a pledged amount, though, is neither a success nor a failure and is therefore useless to include in the dataset (those rows were dropped)
  5. 281,646 rows after handling missing data
  6. I found no odd or inappropriate values in any of the columns, including usd pledged, which has no negative values
  7. done
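
The cleaning approach described in the answers above (keep unnamed projects, drop rows with no pledged amount) might look something like this in pandas. It is a sketch on made-up data; the column names `name`, `goal`, and `usd pledged` are assumptions about how the dataset is labeled.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data; column names are assumptions.
df = pd.DataFrame({
    "name": ["Board Game", None, "Indie Film", "Art Book"],
    "goal": [1000.0, 5000.0, 20000.0, 500.0],
    "usd pledged": [1500.0, 250.0, np.nan, 500.0],
})

cleaned = df.copy()

# A project without a name is still a project: label it "unknown".
cleaned["name"] = cleaned["name"].fillna("unknown")

# A project with no pledged amount cannot be judged, so drop those rows.
cleaned = cleaned.dropna(subset=["usd pledged"]).reset_index(drop=True)

# Report how many rows were removed, and check for impossible negative pledges.
rows_dropped = len(df) - len(cleaned)
has_negative_pledges = (cleaned["usd pledged"] < 0).any()
print(f"Dropped {rows_dropped} row(s); {len(cleaned)} remain")
print(f"Any negative pledged amounts? {has_negative_pledges}")
```

Working on a copy keeps the original DataFrame available, which makes it easy to compare row counts before and after cleaning.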

Part 3: Exploratory Analysis

With a clean dataset, you are now ready to start exploring the variables in your dataset. Don't worry about how your variables relate to each other - we will cover that in the next section. For now, it's more important that you get a clear sense of the variable characteristics on their own. Follow the steps below to explore all of the variables within your dataset and perform descriptive statistics. In addition to the descriptive statistics, you are tasked with creating visualizations related to your results. Stylistic choices related to the visualizations are up to your group. Use the space below (double-click the cell to activate) to write a brief paragraph describing the steps you took to explore the data.

Questions to Answer

  1. What is the average (mean) for the following variables: goal, usd pledged, backers, and length?
  2. What is the maximum value, minimum value, and range for the following variables: goal, usd pledged, backers, and length?
  3. What is the most common (mode) length for campaign projects?
  4. Considering the categorical variables, what is the most frequent main category group? How many projects are classified under this category? What is the most frequent sub-category? How many projects are classified under this category?
  5. Considering the categorical variables, what is the least frequent main category group? How many projects are classified under this category? What is the least frequent sub-category? How many projects are classified under this category?
  6. Which country has started the most Kickstarter campaigns?
  7. How many projects have failed? How many projects have succeeded?
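
One possible way to compute the descriptive statistics asked for above is sketched below. The data is made up for illustration, and the column names (`main_category`, `category` for the sub-category, `country`, `goal`, `usd pledged`, `backers`, `length`, `state`) are assumptions rather than the confirmed names in the dataset.

```python
import pandas as pd

# Made-up stand-in data; column names are assumptions.
df = pd.DataFrame({
    "main_category": ["Games", "Music", "Games", "Art", "Music", "Games"],
    "category": ["Tabletop Games", "Rock", "Video Games",
                 "Painting", "Rock", "Tabletop Games"],
    "country": ["US", "GB", "US", "CA", "US", "US"],
    "goal": [1000.0, 5000.0, 20000.0, 500.0, 3000.0, 1500.0],
    "usd pledged": [1500.0, 250.0, 4000.0, 500.0, 3200.0, 100.0],
    "backers": [40, 5, 80, 12, 60, 3],
    "length": [30, 30, 45, 15, 30, 60],
    "state": ["successful", "failed", "failed",
              "successful", "successful", "failed"],
})

# Questions 1 and 2: mean, min, max, and range for the numeric variables.
numeric_cols = ["goal", "usd pledged", "backers", "length"]
summary = df[numeric_cols].agg(["mean", "min", "max"]).T
summary["range"] = summary["max"] - summary["min"]
print(summary)

# Question 3: most common campaign length.
# mode() returns every tied value in sorted order; this takes the smallest.
mode_length = df["length"].mode().iloc[0]

# Questions 4 and 5: most and least frequent main category, with counts.
main_counts = df["main_category"].value_counts()
most_common_main = main_counts.idxmax()
least_common_main = main_counts.idxmin()
print(main_counts)

# Question 6: country with the most campaigns.
top_country = df["country"].value_counts().idxmax()

# Question 7: number of projects in each outcome state.
state_counts = df["state"].value_counts()
print(state_counts)
```

The same `value_counts()` pattern applies to the sub-category column; only the column name changes.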

Visualizations to Create


EXPLORATORY DATA ANALYSIS

Start your paragraph here . . . Be sure to answer all questions in this space!

Part 4: Variable Relationships

It's time to explore the relationships between variables and answer some of the critical questions for the project. Your dependent/outcome variable is STATE - this is the variable that captures if the project was successful or not. In addition to exploring the relationships between your other independent variables, you want to pay close attention to the relationship between your independent variables and state. Follow the steps below to explore the relationships between your variables. In addition, you are tasked with creating visualizations related to your results. Stylistic choices related to the visualizations are up to your group. Use the space below (double-click the cell to activate) to write a brief paragraph describing the steps you took to determine variable relationships.

Questions to Answer

  1. How correlated are the numeric variables within this dataset? Create a correlation matrix to find out. Is anything highly correlated?
  2. What is the average amount of money pledged across each of the main categories? What about across the following: sub-category, country, currency, and state? Which main category is the most profitable?
  3. What is the average number of backers across each of the main categories? What about across the following: sub-category, country, currency, and state? Which main category is the most popular?
  4. What percentage of projects succeed and fail across each of the main categories?
  5. Which of the main categories have the highest success rate (top 3)? Which of the sub-categories have the highest success rate (top 3)?
  6. What is the average duration of a campaign for projects that succeed? What is the average duration for projects that fail?
  7. What is the average funding goal of a campaign for projects that succeed? What is the average funding goal for projects that fail?
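
The relationship questions above mostly reduce to a correlation matrix plus a handful of group-by summaries. The sketch below shows one way to do that on made-up data; the column names and the `"successful"` label in `state` are assumptions, and the helper column `is_success` is introduced here purely for illustration.

```python
import pandas as pd

# Made-up stand-in data; column names and state labels are assumptions.
df = pd.DataFrame({
    "main_category": ["Games", "Music", "Games", "Art", "Music", "Games"],
    "goal": [1000.0, 5000.0, 20000.0, 500.0, 3000.0, 1500.0],
    "usd pledged": [1500.0, 250.0, 4000.0, 500.0, 3200.0, 100.0],
    "backers": [40, 5, 80, 12, 60, 3],
    "length": [30, 30, 45, 15, 30, 60],
    "state": ["successful", "failed", "failed",
              "successful", "successful", "failed"],
})

# Question 1: pairwise (Pearson) correlations between the numeric variables.
corr_matrix = df[["goal", "usd pledged", "backers", "length"]].corr()
print(corr_matrix.round(2))

# Questions 2 and 3: average pledged amount and backers per main category.
mean_by_main = df.groupby("main_category")[["usd pledged", "backers"]].mean()
print(mean_by_main)

# Questions 4 and 5: success rate (%) per main category, highest first.
df["is_success"] = df["state"].eq("successful")
success_rate = (
    df.groupby("main_category")["is_success"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)
top3_success = success_rate.head(3)
print(top3_success)

# Questions 6 and 7: average campaign length and goal by outcome.
mean_by_state = df.groupby("state")[["length", "goal"]].mean()
print(mean_by_state)
```

Swapping `"main_category"` for the sub-category, country, or currency column gives the other breakdowns. If the `state` column contains values other than successful and failed, every other value counts as "not successful" in this sketch, so it may be worth filtering those rows first.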

Visualizations to Create


RELATIONSHIPS BETWEEN VARIABLES

Start your paragraph here . . . Be sure to answer all questions in this space!

Part 5: Group Presentation

By now, your group should feel very comfortable with the aspects of the Kickstarter dataset. You should have a firm understanding of what Kickstarter is, what the dataset contains, the characteristics of each variable, how the variables interact with each other, and finally, which variables influence the outcome of the Kickstarter campaign. Can you put all this information together to tell a story about the data? Your presentation should include visualizations and clear answers to the two primary questions:

1. What kinds of projects should the Kickstarter Marketing Team focus their attention on?
2. Why should the Marketing Team focus their attention in that direction?

In addition to these two questions, your presentation should cover the additional questions listed below. These questions should be easy to answer using the information you discovered above. When you are done, submit your completed notebook to me.

Questions to Answer

  1. What main category is the most profitable (highest amount of money pledged)?
  2. Which main category is the most popular (highest number of backers)?
  3. Which sub-categories are the most profitable and popular (top 3)?
  4. What are some of the characteristics of a successful Kickstarter campaign?
  5. How does the success or failure of Kickstarter campaigns differ between main category, sub-category, country, and length of campaign? Please mention only the most notable differences; you do not need to detail the success/failure rate for each category/sub-category.

TIPS FOR A GREAT PRESENTATION